Abstract

The optimization problem that arises out of the least median of 
squared residuals method in linear regression is analyzed. To simplify 
the analysis, the problem is replaced by an equivalent one of minimiz- 
ing the median of absolute residuals. A useful representation of the 
last problem is given to examine properties of the objective function 
and estimate the number of its local minima. It is shown that the
exact number of local minima is equal to $\binom{p + \lfloor (n-1)/2 \rfloor}{p}$, where $p$ is the
dimension of the regression model and $n$ is the number of observations.
As applications of the results, three algorithms are also outlined. 



1 Introduction 

The least median of squares (LMS) method has recently been proposed by 
Rousseeuw in pQ to provide a very robust estimate of parameters in linear 
regression problems. The LMS estimate can be obtained as the solution of 
the following optimization problem. 

Let $x_i^T = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$, and $y = (y_1, \ldots, y_n)^T$ be given
real vectors. We assume that $n/2 > p$ and that the $(n \times p)$-matrix $X = [x_{ij}]$
is of full rank to avoid degenerate cases. Let $\theta = (\theta_1, \ldots, \theta_p)^T$ be a vector
of regression parameters. The optimization problem that arises out of the
LMS method is to find $\theta^*$ providing

$$\min_{\theta} \operatorname{med} \{(y_i - x_i^T \theta)^2\}. \tag{1}$$

It is known (see [2 [31 [U 0]) that the objective function in (1) is hard to
minimize. This function is multi-extremal; it is considered as having $O(n^p)$
local minima. In fact, efficient and exact algorithms are available only
for problems of the lowest dimensions. A simple algorithm for $p = 1$
can be found in pQ. Two others, designed for problems of dimension
$p = 2$, have been described in [2] and [1]. For other dimensions, there are
probabilistic algorithms producing approximate solutions of the problem
(see [5] and [3]).

The purpose of this paper is to present some new ideas concerning the 
LMS problem so as to provide some theoretical framework for efficient re- 
gression algorithms. In Section 2 we offer a useful representation of the 
problem. The representation is exploited in Section 3 to demonstrate prop- 
erties of the objective function and estimate the number of its local minima. 
Section 4 includes our main result providing the exact number of local min- 
ima. Finally, in Section 5 we briefly outline three LMS regression algorithms 
based on the above results. 



2 Representation of the LMS Problem 

To produce our representations, we first replace (1) by an equivalent problem
examined below. Since squaring is monotone on nonnegative numbers, the
solutions of (1) are exactly the same as those of the problem

$$\min_{\theta} \operatorname{med} \{|y_i - x_i^T \theta|\}. \tag{2}$$

A serious difficulty one meets in analyzing both problems (1) and (2)
is that it is hard to understand how the median behaves as the objective
function. The next result offers a useful representation for the median as
well as for other operators defined by means of ordering.

Let $R = \{r_1, \ldots, r_n\}$ be a finite set of real numbers. Suppose that
we arrange its elements in increasing order, and denote the $k$th smallest
element by $r_{(k)}$. If there are elements of equal value, we count them
repeatedly in an arbitrary order.

Lemma 1. For each $k = 1, \ldots, n$, the value of $r_{(k)}$ is given by

$$r_{(k)} = \min_{I \in \mathfrak{I}_k} \max_{i \in I} r_i, \tag{3}$$

where $\mathfrak{I}_k$ is the set of all $k$-subsets of the set $N = \{1, \ldots, n\}$.

Proof. Denote the set of indices of the first $k$ smallest elements by $I^*$.
It is clear that $r_{(k)} = \max_{i \in I^*} r_i$. Consider an arbitrary subset $I \in \mathfrak{I}_k$.
Obviously, if $I \neq I^*$, there is at least one index $j \in I$ such that $r_j \geq r_{(k)}$.
Therefore, we have $r_{(k)} \leq \max_{i \in I} r_i$. It remains to take the minimum over all
$I \in \mathfrak{I}_k$ in the last inequality so as to get (3). □
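
Representation (3) can be checked numerically on a small sample. The
following is a minimal sketch, not part of the original analysis; the function
name `kth_smallest_minmax` and the sample values are illustrative assumptions.

```python
from itertools import combinations


def kth_smallest_minmax(r, k):
    """Compute r_(k) as in (3): the minimum, over all k-subsets, of the subset maximum."""
    return min(
        max(r[i] for i in subset)
        for subset in combinations(range(len(r)), k)
    )


# Hypothetical sample containing a tie, to exercise the "equal values" case.
r = [4.0, -1.5, 2.0, 2.0, 7.5]
for k in range(1, len(r) + 1):
    # The min-max value must agree with the k-th element of the sorted sample.
    assert kth_smallest_minmax(r, k) == sorted(r)[k - 1]
```

The enumeration visits all $\binom{n}{k}$ subsets, so it is only meant to illustrate
the lemma, not to compute order statistics in practice.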

Let $h = \lfloor n/2 \rfloor + 1$, where $\lfloor n/2 \rfloor$ is the largest integer less than or equal
to $n/2$. For simplicity, we assume $\operatorname{med}_{i \in N} r_i = r_{(h)}$. (It is absolutely correct
to define the median in this form if $n$ is odd. However, for an even $n$, it
is normally defined as $\frac{1}{2}(r_{(h-1)} + r_{(h)})$.) By using (3) with $k = h$ and
$r_i = r_i(\theta) = |y_i - x_i^T \theta|$, we may now rewrite (2) as follows:

$$\min_{\theta} \min_{I \in \mathfrak{I}_h} \max_{i \in I} |y_i - x_i^T \theta|. \tag{4}$$

The obtained representation seems to be more useful than the original
because it is based on the well-known functions max and min. Moreover,
the representation allows the problem to be reduced further. In particular, one
may change the order of the two minimization operations in (4) and get

$$\min_{I \in \mathfrak{I}_h} \min_{\theta} \max_{i \in I} |y_i - x_i^T \theta|. \tag{5}$$

Assume $I$ to be a fixed subset of $N$. Consider the problem

$$P(I): \quad \min_{\theta} \max_{i \in I} |y_i - x_i^T \theta|. \tag{6}$$

This is the well-known problem of fitting a linear function according to the
$l_\infty$-criterion, first examined by Fourier in the early 19th century [B]. The
method proposed by Fourier was actually a version of the simplex algorithm
and therefore (6) may be regarded as one of the oldest problems in linear
programming. For modern methods and ideas, the reader is referred to [7].
Incidentally, by introducing an additional variable $\rho$, we may put (6) into the
usual form of a linear programming problem:

$$\min \rho \quad \text{subject to} \quad \rho - x_i^T \theta \geq -y_i, \quad \rho + x_i^T \theta \geq y_i, \quad i \in I. \tag{7}$$

To conclude this section, note that (5) may be regarded as a "two-stage"
problem of both combinatorial optimization and linear programming. It
consists in minimizing a function defined on a discrete set, where each
value of the function is obtained by solving a linear programming problem.
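
The two-stage structure of (5) can be sketched for the simplest case $p = 1$.
The code below is an illustration under stated assumptions, not the method of
the paper: instead of a linear programming solver it solves each $P(I)$ by
trying every crossing point of the lines $\pm(y_i - x_i \theta)$, which is
sufficient for a convex piecewise linear function of one variable. The names
`solve_P` and `lms_two_stage`, the integer-valued data, and the requirement
that each subset contains some $x_i \neq 0$ are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations


def solve_P(I, xs, ys):
    """Solve P(I) for p = 1: minimise max_{i in I} |y_i - x_i*theta| over theta.

    Assumes integer (or Fraction) data and at least one nonzero x_i in I.
    Returns the pair (minimum value, minimising theta).
    """
    cands = set()
    for i in I:
        if xs[i] != 0:
            cands.add(Fraction(ys[i], xs[i]))                   # residual i vanishes
    for i, j in combinations(I, 2):
        if xs[i] != xs[j]:
            cands.add(Fraction(ys[i] - ys[j], xs[i] - xs[j]))   # residuals equal
        if xs[i] + xs[j] != 0:
            cands.add(Fraction(ys[i] + ys[j], xs[i] + xs[j]))   # residuals opposite

    def value(theta):
        return max(abs(ys[i] - xs[i] * theta) for i in I)

    theta = min(cands, key=value)
    return value(theta), theta


def lms_two_stage(xs, ys):
    """Outer stage of (5): minimise over all h-subsets, h = floor(n/2) + 1."""
    n = len(xs)
    h = n // 2 + 1
    return min(solve_P(I, xs, ys) for I in combinations(range(n), h))


# Hypothetical data for a regression through the origin.
value, theta = lms_two_stage([1, 2, 3, 4, 5], [2, 3, 7, 8, 30])
print(value, theta)
```

The outer loop runs over $\binom{n}{h}$ subsets, which is exactly the first rough
bound on the amount of work discussed in the next section.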



3 An Analysis of the Objective Function 

In this section we examine properties of the objective function in (4),

$$F(\theta) = \min_{I \in \mathfrak{I}_h} \max_{i \in I} |y_i - x_i^T \theta|. \tag{8}$$

The main question we will try to answer is how many local minima it can
have. To start the discussion, consider the function
$\varrho_I(\theta) = \max_{i \in I} |y_i - x_i^T \theta|$, $I \subset N$. It is a piecewise linear and convex
function bounded below. Clearly, the problem of minimizing $\varrho_I(\theta)$ always
has a solution.

The function $\varrho_I(\theta)$ can be portrayed as the surface of a convex polyhedron
in a $(p + 1)$-dimensional space. It is not difficult to see that function
(8), which one may now express as $F(\theta) = \min_{I \in \mathfrak{I}_h} \varrho_I(\theta)$, also allows
its graph to be visualized as the surface of some polyhedron, namely the one
produced by taking the union of the polyhedra associated with $\varrho_I(\theta)$ for all
$I \in \mathfrak{I}_h$. Note that $F(\theta)$ is still piecewise linear, but fails to be convex.
An illustration for $p = 1$ and $n = 5$ is given in Figure 1.




Figure 1: An objective function plot. 



The objective function in Figure 1 is multi-extremal; it has three local
minima. It is clear that for practical problems, the number of local
minima of $F(\theta)$ can be enormous. As a first step towards determining this
number, we may conclude from representation (5) that it cannot be greater
than the number of problems $P(I)$ for all $I \in \mathfrak{I}_h$. The latter is equal to
$\binom{n}{h}$, i.e. the number of all $h$-subsets $I \subset N$.

Suppose $\theta^*$ to be the solution of a problem $P(I)$, $|I| \geq p + 1$. One
can state the condition for the function $\varrho_I(\theta)$ to have its minimum at $\theta^*$
(see [7]): it is necessary that there exist a $(p + 1)$-subset $I^* \subset I$ and real
numbers $\lambda_i$ satisfying

$$\sum_{i \in I^*} \lambda_i \varepsilon_i x_i = 0, \quad \sum_{i \in I^*} \lambda_i = 1, \quad \lambda_i \geq 0, \quad i \in I^*, \tag{9}$$

for some $\varepsilon_i \in \{-1, 1\}$. In other words, $\theta^*$ is defined by the point of
intersection of $p + 1$ "active" hyperplanes $\rho + \varepsilon_i x_i^T \theta = \varepsilon_i y_i$ for some choice of
$\varepsilon_i \in \{-1, 1\}$, $i \in I^*$, provided that the intersection point is an "acute" top
of the corresponding polyhedron.

On the other hand, for any $(p + 1)$-subset of indices, we are always able
to choose both $\lambda_i$ and $\varepsilon_i$ suitable to satisfy (9). To illustrate this, let us
examine an arbitrary $(p + 1)$-subset. Without loss of generality, we assume
it to be $\{1, \ldots, p + 1\}$. Consider the equation $\sum_{i=1}^{p} t_i x_i = -t_{p+1} x_{p+1}$, and
set $t_{p+1} = 1$ in it. Since $\operatorname{rank}(X) = p$, we may obtain values of $t_1, \ldots, t_p$
as the unique solution of the above equation. For every $i = 1, \ldots, p + 1$, we
define $\lambda_i = |t_i| / \sum_{j=1}^{p+1} |t_j|$ and $\varepsilon_i = \operatorname{sign}(t_i)$. Obviously, $\lambda_i$ and $\varepsilon_i$,
$i = 1, \ldots, p + 1$, are just those required in (9).
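
The construction of $\lambda_i$ and $\varepsilon_i$ can be traced for $p = 1$, where each
$x_i$ is a scalar and the subset consists of two observations. The sketch
below is illustrative only: the helper names `multipliers_p1` and
`active_point_p1`, the integer data, and the assumption $x_1 \neq 0$ are not
taken from the paper.

```python
from fractions import Fraction


def multipliers_p1(x1, x2):
    """Return (lam, eps) satisfying (9) for p = 1; assumes integer data and x1 != 0."""
    t = [Fraction(-x2, x1), Fraction(1)]        # solves t_1*x_1 = -t_2*x_2 with t_2 = 1
    total = sum(abs(ti) for ti in t)
    lam = [abs(ti) / total for ti in t]         # nonnegative weights summing to 1
    eps = [1 if ti > 0 else -1 for ti in t]     # sign of t_i (any sign works if t_i = 0)
    return lam, eps


def active_point_p1(x1, y1, x2, y2):
    """Intersect the two hyperplanes rho + eps_i*x_i*theta = eps_i*y_i."""
    lam, eps = multipliers_p1(x1, x2)
    theta = Fraction(eps[0] * y1 - eps[1] * y2, eps[0] * x1 - eps[1] * x2)
    rho = eps[0] * (y1 - x1 * theta)
    if rho < 0:
        # Condition (9) still holds when every eps_i changes sign.
        eps, rho = [-e for e in eps], -rho
    return theta, rho, lam, eps


# Hypothetical pair of observations (x_1, y_1) = (1, 1), (x_2, y_2) = (2, 5).
theta, rho, lam, eps = active_point_p1(1, 1, 2, 5)
assert sum(l * e * x for l, e, x in zip(lam, eps, (1, 2))) == 0   # first equation in (9)
assert abs(1 - 1 * theta) == abs(5 - 2 * theta) == rho            # both residuals are active
```

For larger $p$ the same steps apply, with the scalar division replaced by the
solution of a $p \times p$ linear system.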

As we have shown, the solution of any problem $P(I)$ is determined by
$p + 1$ vectors $x_i$. Conversely, any $p + 1$ vectors $x_i$ produce only one point
which satisfies the necessary condition (9) and can therefore be treated as
the solution of some problem. Clearly, the number of local minima of
$F(\theta)$ cannot be greater than the number of such points, which equals $\binom{n}{p+1}$.
Since we assume that $p < n/2$, our first estimate $\binom{n}{h}$ can be improved by
replacing it by $\binom{n}{p+1}$. Although the last estimate is still rough, it is
much lower than the quantity $n^p$ considered in [2j [31 H] as the order of the
number of local minima.



4 The Exact Number of Local Minima 

We may now present our main result, which provides the exact number
of local minima of the objective function $F(\theta)$. In fact, it allows the number
of local minima to be determined for any function of the absolute residuals
$|y_i - x_i^T \theta|$, $i \in N$, defined by using representation (3).

For each $k = 0, 1, \ldots, n - (p + 1)$, let us introduce the function

$$f_k(\theta) = \min_{I \in \mathfrak{I}_{n-k}} \max_{i \in I} |y_i - x_i^T \theta|, \tag{10}$$

and denote the number of its local minima by $M_k$. It should be noted that
we have to set $k = n - h = \lfloor (n-1)/2 \rfloor$ in (10) to produce the objective function
of problem (4).

Theorem 1. For each $k = 0, 1, \ldots, n - (p + 1)$, it holds that

$$M_k = \binom{p+k}{p}. \tag{11}$$

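Sketch of the proof. Let $\Pi$ be the set of problems $P(I)$ for all $I \subset N$,
$|I| \geq p + 1$. To prove the theorem, we express $|\Pi|$, i.e. the number of all
problems in $\Pi$, in two ways. Firstly, since the complement of each such $I$
has at most $n - (p + 1)$ elements, this number may be calculated as the sum

$$|\Pi| = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n-(p+1)} = \sum_{j=0}^{n-(p+1)} \binom{n}{j}. \tag{12}$$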

To produce the second representation, we examine a local minimum of
the function $f_k(\theta)$ for an arbitrary $k$, $0 \leq k \leq n - (p + 1)$. Assume $\theta^*$
to be the point of the local minimum. It is clear that $\theta^* = \theta^*(I)$ is the
solution of some problem $P(I)$, where $|I| = n - k$. Since $\theta^*$ is actually
determined by a subset $I^* \subset I$, which consists of $p + 1$ "active" indices, it
is also the solution of problems $P(I \setminus J)$ for all $J \subset I \setminus I^*$. The number
of problems having the solution at $\theta^*$ coincides with the number of all
subsets of $I \setminus I^*$, including the empty set $\emptyset$, and equals $2^{n-(p+1)-k}$. In that
case, the total number of problems connected with the local minima of
$f_k(\theta)$ is $2^{n-(p+1)-k} M_k$.

Now we may express $|\Pi|$ in the form:

$$|\Pi| = 2^{n-(p+1)} M_0 + 2^{n-(p+1)-1} M_1 + \cdots + M_{n-(p+1)} = \sum_{j=0}^{n-(p+1)} 2^{n-(p+1)-j} M_j. \tag{13}$$

From (12) and (13), we have

$$\sum_{j=0}^{n-(p+1)} 2^{n-(p+1)-j} M_j = \sum_{j=0}^{n-(p+1)} \binom{n}{j}. \tag{14}$$

It is not difficult to understand that for a fixed $k$, $0 \leq k \leq n - (p + 1)$,
the number $M_k$ depends on $p$, but not on $n$. One can consider $M_0$
as an illustration. Because the problem $P(N)$ has a unique solution (see
[7]), $M_0$ is always equal to 1. Also, $M_1 = p + 1$ holds independently
of $n$. To see this, note that every local minimum of $f_1(\theta)$ can be
produced by relaxing only one of the $p + 1$ "active" constraints at the minimum
point of $f_0(\theta)$.

Setting $n = p + 1, p + 2, p + 3, \ldots$ in (14), we may successively get
$M_0 = 1$, $M_1 = p + 1$, $M_2 = \frac{(p+1)(p+2)}{2}$, .... It is not difficult to verify that
the general solution of (14) is represented as (11). □
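
The last step of the proof can be checked numerically for small values. The
sketch below substitutes $M_j = \binom{p+j}{p}$ into the left-hand side of (14)
and compares it with the right-hand side; the function names and the tested
ranges of $p$ and $n$ are arbitrary illustrative choices.

```python
from math import comb


def lhs_14(n, p):
    """Left-hand side of (14) with M_j = C(p + j, p) substituted from (11)."""
    m = n - (p + 1)
    return sum(2 ** (m - j) * comb(p + j, p) for j in range(m + 1))


def rhs_14(n, p):
    """Right-hand side of (14): the sum of C(n, j) for j = 0, ..., n - (p + 1)."""
    m = n - (p + 1)
    return sum(comb(n, j) for j in range(m + 1))


for p in range(1, 6):
    for n in range(p + 1, 25):
        assert lhs_14(n, p) == rhs_14(n, p)
```

Such a check does not replace the proof, but it confirms that (11) and (14)
agree on every tested pair.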

Finally, substituting $k = \lfloor (n-1)/2 \rfloor$ into (11), we conclude that the objective
function of the LMS problem has $\binom{p + \lfloor (n-1)/2 \rfloor}{p}$ local minima.
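
For $p = 1$ the count can be compared with a direct enumeration on a small
example. The sketch below is an illustration under assumptions: it uses a
hypothetical data set with all $x_i = 1$ (a location model), exact rational
arithmetic, and counts only strict local minima. Since $F(\theta)$ is linear
between consecutive crossing points of the lines $\pm(y_i - x_i \theta)$, it is
enough to compare each crossing point with the midpoints of the two
neighbouring intervals.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def F(theta, xs, ys, h):
    """Objective (8) for p = 1: the h-th smallest absolute residual."""
    return sorted(abs(y - x * theta) for x, y in zip(xs, ys))[h - 1]


def breakpoints(xs, ys):
    """All theta at which two of the lines +/-(y_i - x_i*theta) cross (integer data)."""
    pts = set()
    for x, y in zip(xs, ys):
        if x != 0:
            pts.add(Fraction(y, x))
    for (xi, yi), (xj, yj) in combinations(list(zip(xs, ys)), 2):
        if xi != xj:
            pts.add(Fraction(yi - yj, xi - xj))
        if xi + xj != 0:
            pts.add(Fraction(yi + yj, xi + xj))
    return sorted(pts)


def local_minima(xs, ys):
    """Strict local minima of F, found by comparing with neighbouring midpoints."""
    h = len(xs) // 2 + 1
    pts = breakpoints(xs, ys)
    found = []
    for idx, t in enumerate(pts):
        left = (pts[idx - 1] + t) / 2 if idx > 0 else t - 1
        right = (t + pts[idx + 1]) / 2 if idx + 1 < len(pts) else t + 1
        here = F(t, xs, ys, h)
        if here < F(left, xs, ys, h) and here < F(right, xs, ys, h):
            found.append(t)
    return found


xs, ys = [1, 1, 1, 1, 1], [0, 1, 3, 6, 10]      # hypothetical data, n = 5, p = 1
minima = local_minima(xs, ys)
print(len(minima), comb(1 + (len(xs) - 1) // 2, 1))
```

For this particular data set the enumeration finds three strict local minima,
which agrees with the value given by the formula for $p = 1$ and $n = 5$.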



5 Applications 

In this section we briefly outline LMS regression algorithms based on the 
above analysis of the problem. Only the main ideas that underlie the algo- 
rithms are presented. 



"Greedy" algorithm. The algorithm produces an approximate solution 
and consists of solving the sequence of problems (6), $P(I_0), P(I_1), \ldots, P(I_{n-h})$,
where $I_0 = N$ and the sets $I_1, I_2, \ldots, I_{n-h}$ are defined as follows. Let $I_k^*$
be the set of $p + 1$ "active" indices for the solution of a problem $P(I_k)$.
Clearly, for each $i \in I_k^*$, the minimum of the objective function in the
problem $P(I_k \setminus \{i\})$ is no greater than that in $P(I_k)$. Denote by $i_k^*$ the
index that yields the problem having the lowest solution. Finally, we define
$I_{k+1} = I_k \setminus \{i_k^*\}$.

The "greedy" algorithm formally requires solving $(n - h) \times (p + 1) + 1$
optimization problems. In practice, however, an efficient procedure of
transition between points, which yields the solutions of the problems, may
be designed to avoid solving each of them.
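
The steps of the "greedy" algorithm can be sketched for $p = 1$ as follows.
This is a minimal illustration, not the implementation meant by the paper:
each $P(I)$ is solved from scratch by trying all crossing points of the lines
$\pm(y_i - x_i \theta)$ rather than by an efficient transition between points,
and the names `solve_P` and `greedy_lms`, the integer data, and the use of
exact equality to detect "active" indices are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations


def solve_P(I, xs, ys):
    """Solve P(I) for p = 1 (integer data, some x_i != 0 in I); return (rho, theta)."""
    cands = set()
    for i in I:
        if xs[i] != 0:
            cands.add(Fraction(ys[i], xs[i]))
    for i, j in combinations(I, 2):
        if xs[i] != xs[j]:
            cands.add(Fraction(ys[i] - ys[j], xs[i] - xs[j]))
        if xs[i] + xs[j] != 0:
            cands.add(Fraction(ys[i] + ys[j], xs[i] + xs[j]))

    def value(theta):
        return max(abs(ys[i] - xs[i] * theta) for i in I)

    theta = min(cands, key=value)
    return value(theta), theta


def greedy_lms(xs, ys):
    """Drop one active index at a time until h = floor(n/2) + 1 indices remain."""
    n = len(xs)
    h = n // 2 + 1
    I = tuple(range(n))
    rho, theta = solve_P(I, xs, ys)
    for _ in range(n - h):
        # Active indices: residuals whose absolute value attains the maximum rho.
        active = [i for i in I if abs(ys[i] - xs[i] * theta) == rho]
        trials = []
        for i in active:
            reduced = tuple(j for j in I if j != i)
            trials.append(solve_P(reduced, xs, ys) + (reduced,))
        rho, theta, I = min(trials)      # keep the relaxation with the lowest value
    return rho, theta, I


# Hypothetical location-model data (all x_i = 1).
print(greedy_lms([1, 1, 1, 1, 1], [0, 1, 3, 6, 10]))
```

With two active indices per step, this example solves $(n - h) \times (p + 1) + 1 = 5$
problems, in line with the count stated above. The result is only an
approximate solution in general, as noted in the description of the algorithm.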

Exhaustive search algorithm. This algorithm may be considered as the
complete version of the previous one, which actually uses a reduced search
procedure. It exploits the classical depth-first search technique to provide all
local minima of the objective function. From Theorem 1 one can determine
the number of points it has to examine to produce the exact solution.

Because of its exponential time complexity, this search algorithm can hardly
be applied to problems of high dimensions. Note, however, that it normally
allows problems with $p < 5$ to be solved within reasonable time.

Branch and probability bound algorithm. It is a random search algorithm
based on the Branch and Probability Bound (BPB) technique which
has been developed in [5] as an efficient tool for solving both continuous and
discrete optimization problems. The BPB algorithm designed to solve the
LMS problem is a combinatorial optimization algorithm. It produces an
approximate solution by searching over $(p + 1)$-subsets of $N$. As follows from
Section 3, each $(p + 1)$-subset determines a point satisfying condition (9);
one of these points is the solution of the LMS problem.
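
The idea of searching over $(p + 1)$-subsets can be illustrated for $p = 1$ with
plain uniform random sampling of pairs. This is deliberately not the BPB
technique itself, whose branching and probabilistic bounding rules are not
described here; the function names, the fixed seed, the number of trials and
the integer data are assumptions made for the sketch.

```python
import random
from fractions import Fraction


def lms_objective(theta, xs, ys, h):
    """h-th smallest absolute residual |y_i - x_i*theta| (p = 1)."""
    return sorted(abs(y - x * theta) for x, y in zip(xs, ys))[h - 1]


def subset_point(i, j, xs, ys):
    """Point determined by the 2-subset {i, j} via condition (9), p = 1.

    With eps_j = 1, eps_i is the sign of t_i = -x_j / x_i; the point equalises
    the two absolute residuals. Assumes x_i and x_j are not both zero.
    """
    eps_i = 1 if xs[i] * xs[j] < 0 else -1
    return Fraction(eps_i * ys[i] - ys[j], eps_i * xs[i] - xs[j])


def random_subset_search(xs, ys, trials=200, seed=0):
    """Sample random pairs and keep the best candidate (value, theta) found."""
    rng = random.Random(seed)
    n = len(xs)
    h = n // 2 + 1
    best = None
    for _ in range(trials):
        i, j = rng.sample(range(n), 2)
        if xs[i] == 0 and xs[j] == 0:
            continue                      # no point is determined by this pair
        theta = subset_point(i, j, xs, ys)
        cand = (lms_objective(theta, xs, ys, h), theta)
        if best is None or cand < best:
            best = cand
    return best


print(random_subset_search([1, 1, 1, 1, 1], [0, 1, 3, 6, 10]))
```

Because the LMS solution is among the points generated by the
$(p + 1)$-subsets, the best sampled value can never fall below the optimum and
approaches it as more subsets are examined.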

In conclusion, I would like to thank Professor A. A. Zhigljavsky for draw- 
ing my attention to the problem and for valuable discussions, and Professor 
A.C. Atkinson for his kind interest in this work as well as for providing me 
with a reprint of paper [5]. 